Abstract

In this paper we undertake the general study of the Abelian complexity of an infinite word 
on a finite alphabet. We investigate both similarities and differences between the Abelian 
complexity and the usual subword complexity. While the Thue-Morse minimal subshift is 
neither characterized by its Abelian complexity nor by its subword complexity alone, we show 
that the subshift is completely characterized by the two complexity functions together. We 
give an affirmative answer to an old question of G. Rauzy by exhibiting a class of words 
whose Abelian complexity is everywhere equal to 3. We also investigate links between Abelian 
complexity and the existence of Abelian powers. Using van der Waerden's Theorem, we show 
that any minimal subshift having bounded Abelian complexity contains Abelian $k$-powers for
every positive integer $k$. In the case of Sturmian words we prove something stronger: For
every Sturmian word $\omega$ and positive integer $k$, each sufficiently long factor of $\omega$ begins in an
Abelian $k$-power.

1 Introduction 

In this paper we undertake the general study of the Abelian complexity of an infinite word on 
a finite alphabet. Although some of the topics outlined in this paper have already appeared in
the literature (see [11], [16], [24]), to date very little is known about the general Abelian theory of
words. In fact, prior to this paper, some of the key notions had not even been formally defined. 
Given a finite non-empty set $A$ (called the alphabet), we denote by $A^*$, $A^{\mathbb N}$ and $A^{\mathbb Z}$ respectively
the set of finite words, the set of (right) infinite words, and the set of bi-infinite words
over the alphabet $A$. We endow $A^{\mathbb N}$ with the topology generated by the metric

$$d(x, y) = \frac{1}{2^n} \quad\text{where}\quad n = \inf\{k : x_k \neq y_k\}$$

whenever $x = (x_n)_{n \in \mathbb N}$ and $y = (y_n)_{n \in \mathbb N}$ are two elements of $A^{\mathbb N}$. Let $T : A^{\mathbb N} \to A^{\mathbb N}$ denote
the shift transformation defined by $T : (x_n)_{n \in \mathbb N} \mapsto (x_{n+1})_{n \in \mathbb N}$. By a subshift on $A$ we mean a
pair $(X, T)$ where $X$ is a closed and $T$-invariant subset of $A^{\mathbb N}$. A subshift $(X, T)$ is said to be
minimal whenever $X$ and the empty set are the only $T$-invariant closed subsets of $X$.

Given a finite word $u = a_1a_2\cdots a_n$ with $n \ge 1$ and $a_i \in A$, we denote the length $n$ of $u$
by $|u|$. The empty word will be denoted by $\varepsilon$ and we set $|\varepsilon| = 0$. For each $a \in A$, we let $|u|_a$
denote the number of occurrences of the letter $a$ in $u$. Two words $u$ and $v$ in $A^*$ are said to be
Abelian equivalent, denoted $u \sim_{ab} v$, if and only if $|u|_a = |v|_a$ for all $a \in A$. It is readily
verified that $\sim_{ab}$ defines an equivalence relation on $A^*$.

Given an infinite word $\omega = \omega_0\omega_1\omega_2\ldots \in A^{\mathbb N}$ with $\omega_i \in A$, we denote by $\mathcal F_\omega(n)$ the set
of all factors of $\omega$ of length $n$, that is, the set of all finite words of the form $\omega_i\omega_{i+1}\cdots\omega_{i+n-1}$
with $i \ge 0$. We set

$$\rho_\omega(n) = \mathrm{Card}(\mathcal F_\omega(n)).$$

The function $\rho_\omega : \mathbb N \to \mathbb N$ is called the subword complexity function of $\omega$. Given a minimal
subshift $(X, T)$ on $A$, we have that $\mathcal F_\omega(n) = \mathcal F_{\omega'}(n)$ for all $\omega, \omega' \in X$ and $n \in \mathbb N$. Thus we
can define the subword complexity $\rho_{(X,T)}(n)$ of a minimal subshift $(X, T)$ by

$$\rho_{(X,T)}(n) = \rho_\omega(n)$$

for any $\omega \in X$.

Analogously we define $\mathcal F^{ab}_\omega(n) = \mathcal F_\omega(n)/\sim_{ab}$ and set

$$\rho^{ab}_\omega(n) = \mathrm{Card}(\mathcal F^{ab}_\omega(n)).$$

The function $\rho^{ab}_\omega : \mathbb N \to \mathbb N$, which counts the number of pairwise non Abelian equivalent
factors of $\omega$ of length $n$, is called the Abelian complexity of $\omega$, or ab-complexity for short$^1$. As
in the case of subword complexity, the definition of Abelian complexity naturally extends to
the context of a minimal subshift.

In most instances, the alphabet $A$ will consist of the numbers $\{0, 1, 2, \ldots, k-1\}$. In this
case, for each $u \in A^*$, we denote by $\Psi(u)$ the Parikh vector$^2$ associated to $u$, that is

$$\Psi(u) = (|u|_0, |u|_1, |u|_2, \ldots, |u|_{k-1}).$$

Given an infinite word $\omega \in A^{\mathbb N}$ we set

$$\Psi_\omega(n) = \{\Psi(u) \mid u \in \mathcal F_\omega(n)\}$$



$^1$ A different, yet related notion of Abelian complexity, called Parikh complexity, was considered in [15].

$^2$ Parikh matrices, an extension of Parikh vectors, were recently introduced to characterize words in terms of the
occurrences of some scattered subwords (see, e.g., [21], [13], [29]).
and thus we have $\rho^{ab}_\omega(n) = \mathrm{Card}(\Psi_\omega(n))$.
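As an illustration, the two complexity functions can be approximated on a finite prefix of an infinite word. The sketch below is not part of the paper: the helper names are ours, and the sample word (a prefix of the Fibonacci word) is chosen only as an assumption for the demonstration. A finite prefix can only under-count factors, so the values agree with those of the infinite word only when the prefix is long enough for the lengths considered.

```python
from collections import Counter

def parikh(u, alphabet):
    """Parikh vector of the finite word u over the ordered alphabet."""
    counts = Counter(u)
    return tuple(counts[a] for a in alphabet)

def factors(w, n):
    """Set of all length-n factors occurring in the finite word w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def subword_complexity(w, n):
    """Number of distinct factors of length n in w."""
    return len(factors(w, n))

def abelian_complexity(w, n, alphabet):
    """Number of distinct Parikh vectors among factors of length n in w."""
    return len({parikh(u, alphabet) for u in factors(w, n)})

def fibonacci_prefix(length):
    """Prefix of the Fibonacci word, built by f_{m+1} = f_m f_{m-1}."""
    a, b = "0", "01"
    while len(b) < length:
        a, b = b, b + a
    return b[:length]

w = fibonacci_prefix(2000)
sub = [subword_complexity(w, n) for n in range(1, 11)]
ab = [abelian_complexity(w, n, "01") for n in range(1, 11)]
print(sub)  # subword complexity for n = 1..10
print(ab)   # Abelian complexity for n = 1..10
```

On this prefix the first list grows by one at each step while the second stays constant, which matches the behavior the text attributes to Sturmian words.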

There are a number of similarities between the usual subword complexity of an infinite 
word and its Abelian complexity. For instance both may be used to characterize periodic
bi-infinite words. Here a word $\omega$ is periodic if there exists a positive integer $p$ such that $\omega_{i+p} = \omega_i$
for all indices $i$, and it is ultimately periodic if $\omega_{i+p} = \omega_i$ for all sufficiently large $i$. An infinite
word is aperiodic if it is not ultimately periodic.

Theorem 1. Let $\omega$ be a bi-infinite word over the alphabet $A$. The following properties hold.

• (M. Morse, G.A. Hedlund, [23]) The word $\omega$ is periodic if and only if $\rho_\omega(n_0) \le n_0$ for
some $n_0 \ge 1$.

• (E.M. Coven and G.A. Hedlund, [11]) The word $\omega$ is periodic if and only if $\rho^{ab}_\omega(n_0) = 1$
for some $n_0 \ge 1$. (See also Lemma 2.3.)

The condition in the first item of the previous theorem also gives rise to a characterization 
of ultimately periodic words by means of subword complexity. Abelian complexity does not, 
however, yield such a characterization. Indeed, both Sturmian words (see below) and the
ultimately periodic word $01^\infty = 0111\cdots$ have the same constant Abelian complexity, equal to 2.

As another example, both complexity functions may be used to characterize Sturmian 
words: 

Theorem 2. Let $\omega$ be an aperiodic infinite word over the alphabet $\{0, 1\}$. The following
conditions are equivalent:

• The word $\omega$ is balanced, that is, Sturmian.

• (M. Morse, G.A. Hedlund, [23]). The word $\omega$ satisfies $\rho_\omega(n) = n + 1$ for all $n \ge 0$.

• (E.M. Coven, G.A. Hedlund, [11]). The word $\omega$ satisfies $\rho^{ab}_\omega(n) = 2$ for all $n \ge 1$. (See
also the discussion in the beginning of Section 4.)

We recall that an infinite word $\omega \in A^{\mathbb N}$ is said to be $C$-balanced ($C$ a positive integer) if
$\bigl||U|_a - |V|_a\bigr| \le C$ for all $a \in A$ and all factors $U$ and $V$ of $\omega$ of equal length. A word $\omega$ is
called balanced if it is 1-balanced.

However, in some cases the two complexities exhibit radically different behaviors. For
instance, as was originally pointed out to us by P. Arnoux [6], consider the binary
Champernowne word

$$\mathcal C = 01101110010111011110001001\ldots$$

obtained by concatenating the binary representation of the consecutive natural numbers. Let
$\omega$ denote the morphic image of $\mathcal C$ under the Thue-Morse morphism $\mu$ defined by $0 \mapsto 01$
and $1 \mapsto 10$. Then while $\rho_\omega(n)$ has exponential growth, we will see in Theorem 3.3 that
$\rho^{ab}_\omega(n) \le 3$ for all $n$.
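The contrast just described can be observed numerically. The following sketch is ours, not the authors': it builds a finite prefix of the binary Champernowne word, applies the Thue-Morse morphism, and tabulates both complexities on that prefix. The function names and the prefix size (the first 2000 integers) are assumptions made for the illustration. On a binary alphabet the Parikh vector of a factor of fixed length is determined by its number of 1s, which is what the Abelian count uses.

```python
def champernowne_binary(num_integers):
    """Concatenate the binary expansions of 0, 1, ..., num_integers - 1."""
    return "".join(format(i, "b") for i in range(num_integers))

def thue_morse_morphism(w):
    """Apply the morphism 0 -> 01, 1 -> 10 letter by letter."""
    return "".join("01" if c == "0" else "10" for c in w)

def subword_complexity(w, n):
    """Number of distinct length-n factors of the finite word w."""
    return len({w[i:i + n] for i in range(len(w) - n + 1)})

def abelian_complexity(w, n):
    """Number of distinct counts of 1s among length-n factors (binary word)."""
    return len({w[i:i + n].count("1") for i in range(len(w) - n + 1)})

c = champernowne_binary(2000)
omega = thue_morse_morphism(c)

ab_values = [abelian_complexity(omega, n) for n in range(1, 21)]
sub_values = [subword_complexity(omega, n) for n in range(1, 9)]
print(ab_values)   # never exceeds 3
print(sub_values)  # grows quickly with n
```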

In contrast, there exist infinite words having linear subword complexity and unbounded
ab-complexity. In fact, in [10], the third author together with J. Cassaigne and S. Ferenczi
established the existence of a word with subword complexity $\rho(n) = 2n + 1$ which is not
$C$-balanced for any positive integer $C$. An infinite word $\omega \in A^{\mathbb N}$ is said to be $C$-balanced ($C$
a positive integer) if $\bigl||U|_a - |V|_a\bigr| \le C$ for all $a \in A$ and all factors $U$ and $V$ of $\omega$ of equal
length. We prove that

Lemma 3. $\rho^{ab}_\omega(n)$ is bounded if and only if $\omega$ is $C$-balanced for some positive integer $C$.

Hence the word constructed in [10] of subword complexity $\rho(n) = 2n + 1$ has unbounded
ab-complexity.

Finally, in some cases, the two complexity functions work in tandem to give explicit char- 
acterizations of certain minimal subshifts. For instance we prove that 

Theorem 4. Let $TM_0 \in \{0, 1\}^{\mathbb N}$ denote the fixed point beginning in 0 of the Thue-Morse
morphism $0 \mapsto 01$, $1 \mapsto 10$, and let $\omega$ be a recurrent infinite word. Then

$$\rho_\omega(n) = \rho_{TM_0}(n) \quad (n \in \mathbb N)$$

and

$$\rho^{ab}_\omega(n) = \rho^{ab}_{TM_0}(n) \quad (n \in \mathbb N)$$

if and only if $\omega$ is in the subshift generated by $TM_0$.

Recall that an infinite word $\omega$ is recurrent if each of its factors occurs infinitely often in $\omega$.

Inspired by the last characterization of Sturmian words given in Theorem 2, G. Rauzy
asked the following question:

Question 5 (G. Rauzy, [24]). Does there exist an infinite word $\omega$ whose ab-complexity satisfies
$\rho^{ab}_\omega(n) = 3$ for all $n \ge 1$?

He suggests that the likely answer to this question is NO. However, we give two positive
solutions to Rauzy's question:

Theorem 6. Let $\omega$ be an aperiodic balanced word on the alphabet $\{0, 1, 2\}$. Then $\rho^{ab}_\omega(n) = 3$
for all $n \ge 1$.

Theorem 7. Let $\omega' \in \{0, 1\}^{\mathbb N}$ be any aperiodic infinite word, and let $\omega$ be the image of $\omega'$
under the morphism $0 \mapsto 012$, $1 \mapsto 021$. Then $\rho^{ab}_\omega(n) = 3$ for all $n \ge 1$.

In Section 5 we investigate a surprising connection between Abelian complexity and
avoidance of Abelian powers. By an Abelian $k$-power we mean a non-empty word of the
form $w = u_1u_2\cdots u_k$ where the words $u_i$ are pairwise Abelian equivalent. In this case we say
$w$ has an Abelian period equal to $|u_1|$. Powers, or repetitions, occurring in an infinite word are a
topic of great interest, with applications to a broad range of areas (see, e.g., [2], [3], [5], [20]). One
stream of research dating back to the works of Thue [30], [31] is the study of patterns avoidable
by infinite words (see, e.g., 0U [M |25j US 13). In the Abelian context, F.M. Dekking US
showed that Abelian 4-powers are avoidable on a binary alphabet and that Abelian cubes are
avoidable on a 3-letter alphabet. V. Keränen [17] proved that Abelian squares are avoidable on
four letters. We prove the following general result relating Abelian complexity and Abelian
repetitions.
Theorem 8. Let $\omega$ be an infinite word on a finite alphabet having bounded ab-complexity.
Then $\omega$ contains an Abelian $k$-power for every positive integer $k$.

In view of Theorem 2, the previous theorem implies that Sturmian words admit occurrences
of Abelian $k$-powers for all $k$. In Section 6 we show that actually these words possess a much
stronger property:

Theorem 9. For every Sturmian word $\omega$ and every positive integer $k$, there exist two integers
$\ell_1$ and $\ell_2$ such that each position in $\omega$ has an occurrence of an Abelian $k$-power with Abelian
period $\ell_1$ or $\ell_2$. In particular, every Sturmian word begins in an Abelian $k$-power for all
positive integers $k$.

Acknowledgements The second author is partially supported by grant no. 8121419 from 
the Finnish Academy. The third author is partially supported by grant no. 090038011 from the 
Icelandic Research Fund. 



2 Generalities 

As explained in Section 1, the Abelian complexity $\rho^{ab}_\omega$ of a word $\omega$ is the function which
counts the number of pairwise non Abelian equivalent factors of $\omega$ for each length $n$. In other
words, for all $n \ge 0$, $\rho^{ab}_\omega(n)$ is the cardinality of the set of Parikh vectors of factors of length
$n$ of $\omega$.

Let $a$, $b$ be two letters in $A = \{0, \ldots, p-1\}$ and let $u$ be a word over $A$. If $a = b$, then
$\Psi(au) = \Psi(ub)$. When $a \neq b$, $\Psi(au) - \Psi(ub)$ is the vector all of whose entries are 0 except its
$(a+1)$-th entry, with value $+1$, and its $(b+1)$-th entry, with value $-1$. This shows how Parikh
vectors evolve when considering two successive factors of the same length of a word $\omega$. As an
immediate consequence, we deduce the following fact that will be used very often implicitly.
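The one-step update of Parikh vectors described above also gives a direct way to compute the Abelian complexity of a finite word with a sliding window. The sketch below is an illustration under our own naming (`abelian_complexity_sliding` is hypothetical, not from the paper): when the window moves one position to the right, the letter leaving on the left is decremented and the letter entering on the right is incremented.

```python
def abelian_complexity_sliding(w, n, alphabet):
    """Abelian complexity of the finite word w at length n.

    Uses the update Psi(ub) = Psi(au) - e_a + e_b for successive factors
    au and ub of the same length.
    """
    if n > len(w):
        raise ValueError("n must not exceed the length of w")
    index = {a: i for i, a in enumerate(alphabet)}
    vec = [0] * len(alphabet)
    for c in w[:n]:                    # Parikh vector of the first window
        vec[index[c]] += 1
    seen = {tuple(vec)}
    for i in range(n, len(w)):
        vec[index[w[i - n]]] -= 1      # letter a leaves on the left
        vec[index[w[i]]] += 1          # letter b enters on the right
        seen.add(tuple(vec))
    return len(seen)

sample = "012021012021021012"
print([abelian_complexity_sliding(sample, n, "012") for n in range(1, 7)])
```

Each step changes at most two entries by one unit, which is exactly the discrete continuity exploited in Lemma 2.1.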

Lemma 2.1. If an infinite word $\omega$ has two factors $u$ and $v$ of the same length $n$ for which the $i$th
entries of the Parikh vectors are $p$ and $p+c$ respectively, for some $p$ and $c$, then for all $\ell = 0, \ldots, c$,
there exists a factor $u_\ell$ of $\omega$ of length $n$ whose Parikh vector has $i$th entry $p + \ell$.

With the notation of this fact, we can see that $\rho^{ab}_\omega(n) \ge c + 1$. This implies:

Lemma 2.2. For a word $\omega \in A^{\mathbb N} \cup A^{\mathbb Z}$, the function $\rho^{ab}_\omega$ is bounded if and only if $\omega$ is
$C$-balanced for some positive integer $C$.

Proof. If $\rho^{ab}_\omega$ is bounded by $K$, then $\omega$ is $(K-1)$-balanced. If $\omega$ is $C$-balanced, then entries
of Parikh vectors of factors of $\omega$ of a given length can take at most $C + 1$ values, so that
$\rho^{ab}_\omega(n) \le (C+1)^{\mathrm{card}(A)}$. □

A natural question concerns the possible extremal values of the Abelian complexity. As
shown by the next result, minimal values are reached by periodic words.

Lemma 2.3 (E.M. Coven and G.A. Hedlund, [11, Remark 4.07]). Let $\omega \in A^{\mathbb N} \cup A^{\mathbb Z}$ be a right
infinite or a bi-infinite word. Then $\omega$ is periodic of period $p$ if and only if $\rho^{ab}_\omega(p) = 1$.

Let us recall that a word $u$ is a right special factor of an infinite word $\omega$ if there exist
distinct letters $\alpha$ and $\beta$ such that $u\alpha$ and $u\beta$ are both factors of $\omega$.

Proof of Lemma 2.3. The "only if" part is immediate. The other direction can be deduced
from the observation that a non-periodic word $\omega$ must contain arbitrarily long right special
factors, implying $\rho^{ab}_\omega(n) \ge 2$ for all $n \ge 1$. □

Concerning the maximal Abelian complexity, it is clear that it is reached by any infinite
word containing all finite words as factors, as for instance the Champernowne word. Let us
denote by $\rho^{ab}_{\max}$ the Abelian complexity of such a word. Since, for any word $u$ of length $n$ over a
$k$-letter alphabet, $\Psi(u)$ is a $k$-tuple $(i_1, \ldots, i_k)$ with $n = i_1 + i_2 + \cdots + i_k$, $\rho^{ab}_{\max}(n)$ is the
number of ways of writing $n$ as an ordered sum of $k$ nonnegative integers. This well-known number
(see, e.g., [32]) is called the number of compositions of $n$ into $k$ parts and its value is given
by the binomial coefficient $\binom{n+k-1}{k-1}$. This can be summarized:

Theorem 2.4. For all infinite words $\omega$ over a $k$-letter alphabet, and for all $n \ge 0$,

$$1 \le \rho^{ab}_\omega(n) \le \binom{n+k-1}{k-1}.$$
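The count of compositions invoked above can be checked by brute force for small parameters. The following sketch (our own illustration, with a hypothetical function name) enumerates all words of length $n$ over a $k$-letter alphabet, collects their Parikh vectors, and compares the number of distinct vectors with the binomial coefficient.

```python
from itertools import product
from math import comb

def max_abelian_complexity(n, k):
    """Number of distinct Parikh vectors of words of length n over k letters."""
    alphabet = range(k)
    vectors = {
        tuple(word.count(a) for a in alphabet)
        for word in product(alphabet, repeat=n)
    }
    return len(vectors)

for k in (2, 3):
    # The two printed lists coincide for each k.
    print(k, [max_abelian_complexity(n, k) for n in range(6)],
          [comb(n + k - 1, k - 1) for n in range(6)])
```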



In Section 1 we provided an example of an infinite word having an exponential subword
complexity but a linear Abelian complexity. Here follows an example of a binary infinite word
having maximal Abelian complexity but a linear subword complexity. Let $f$ and $g$ be the
morphisms defined by $f(a) = abc$, $f(b) = bbb$, $f(c) = ccc$, $g(a) = 0 = g(c)$ and $g(b) = 1$.
The image $\omega$ by $g$ of the fixed point of $f$ on $a$ is the word $0\prod_{i \ge 0} 1^{3^i}0^{3^i}$. That $\rho^{ab}_\omega = \rho^{ab}_{\max}$ is
immediate. Since $\omega$ is an automatic sequence, it has linear subword complexity (see Theorems
6.3.2 and 10.3.1 in 0).
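The example can be reproduced on a finite prefix. The sketch below is ours: it iterates $f$ from the letter $a$, applies $g$, and compares the Abelian complexity of the resulting binary prefix with $n + 1$, the maximal value on two letters. The prefix length 3000 and the range of $n$ are arbitrary choices made for the illustration.

```python
def fixed_point_prefix(length):
    """Prefix of the fixed point of f (a -> abc, b -> bbb, c -> ccc) on a."""
    f = {"a": "abc", "b": "bbb", "c": "ccc"}
    w = "a"
    while len(w) < length:
        w = "".join(f[c] for c in w)
    return w[:length]

def abelian_complexity(w, n):
    """Number of distinct counts of 1s among length-n factors (binary word)."""
    return len({w[i:i + n].count("1") for i in range(len(w) - n + 1)})

g = {"a": "0", "b": "1", "c": "0"}
omega = "".join(g[c] for c in fixed_point_prefix(3000))

print(omega[:27])
print([abelian_complexity(omega, n) for n in range(1, 11)])
```

Since the prefix contains the blocks $1^n$ and $0^n$ for every $n$ in the tested range, all intermediate counts of 1s occur as well, in line with Lemma 2.1.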

3 Abelian Complexity of the Thue-Morse Word 

Let us recall that the celebrated Thue-Morse word is the fixed point $TM_0$ beginning in 0 of
the Thue-Morse morphism $\mu$ defined by $\mu(0) = 01$ and $\mu(1) = 10$. The next theorem gives its
Abelian complexity:

Theorem 3.1. $\rho^{ab}_{TM_0}(n) = 2$ for $n$ odd and $\rho^{ab}_{TM_0}(n) = 3$ for $n \neq 0$ even.

This result is a direct corollary of Theorem 3.3 below, which shows that this complexity
is only due to the fact that $TM_0$ is the image of a word by $\mu$. Moreover Theorem 3.3
characterizes the class of all words having the same Abelian complexity as the Thue-Morse word.




In particular, the ab-complexity is in $O(n^{k-1})$.
Corollary 3.5 shows that the subshift generated by the Thue-Morse word can be distinguished
in this class by its subword complexity.
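The alternating values 2 and 3 for the Thue-Morse word can be observed on a finite prefix. The following is a small sketch under our own assumptions (prefix length 4096, lengths up to 40); it is a numerical check on a prefix, not a proof.

```python
def thue_morse_prefix(length):
    """Prefix of the fixed point of 0 -> 01, 1 -> 10 beginning in 0."""
    w = "0"
    while len(w) < length:
        w = "".join("01" if c == "0" else "10" for c in w)
    return w[:length]

def abelian_complexity(w, n):
    """Number of distinct counts of 1s among length-n factors (binary word)."""
    return len({w[i:i + n].count("1") for i in range(len(w) - n + 1)})

tm = thue_morse_prefix(4096)
values = {n: abelian_complexity(tm, n) for n in range(1, 41)}
print([values[n] for n in range(1, 13)])
```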
We need a preliminary result: 

Lemma 3.2. Let $\omega = \omega_0\omega_1\omega_2\cdots \in \{0, 1\}^{\mathbb N}$ be an aperiodic infinite word. Then for every
$k \ge 2$, there exist factors $U$ and $V$ of $\omega$ of length $k$, with $U = 0u1$ and $V = 1v0$ for some
words $u$ and $v$ (possibly empty).

Proof. Suppose for some $k \ge 2$, the aperiodic word $\omega$ does not contain a factor of length $k$
beginning in 0 and ending in 1. This implies that for each $i \ge 0$, if $\omega_i = 0$, then $\omega_{i+k-1} = 0$.
Thus for $i$ sufficiently large we have that $\omega_{i+k-1} = \omega_i$, and hence $\omega$ is eventually periodic, a
contradiction. Similarly, if $\omega$ does not contain a factor $V$ of length $k$ beginning in 1 and ending
in 0, it would follow that $\omega$ is eventually periodic.

□ 

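Theorem 3.3. The Abelian complexity of an aperiodic word $\omega \in \{0, 1\}^{\mathbb N}$ is

$$\rho^{ab}_\omega(n) = 2 \text{ for } n \text{ odd}, \qquad \rho^{ab}_\omega(n) = 3 \text{ for } n \neq 0 \text{ even},$$

if and only if there exists a word $\omega'$ such that $\omega = \mu(\omega')$, $\omega = 0\mu(\omega')$ or $\omega = 1\mu(\omega')$.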

Proof. Assume first that the Abelian complexity of a word $\omega$ is 2 for $n$ odd and 3 for $n \neq 0$
even. We first prove by induction that for all $k \ge 0$:

$$\Psi_\omega(2k+1) = \{(k+1, k), (k, k+1)\} \quad\text{and}\quad \Psi_\omega(2k+2) = \{(k+1, k+1), (k, k+2), (k+2, k)\}.$$

The result is true for $k = 0$. Assume that $\Psi_\omega(2k+2)$ is as above. Since $\rho^{ab}_\omega(2k+3) = 2$, by
Lemma 2.1 there exist integers $p$ and $q$ such that

$$\Psi_\omega(2k+3) = \{(p+1, q), (p, q+1)\}.$$

Moreover

$$\Psi_\omega(2k+3) \subseteq \Psi_\omega(2k+2) + \{(1,0), (0,1)\} = \{(k+2, k+1), (k+1, k+2), (k, k+3), (k+3, k)\}.$$

Thus exactly one of the following three identities holds:

$$\Psi_\omega(2k+3) = \{(k+2, k+1), (k+1, k+2)\},$$
$$\Psi_\omega(2k+3) = \{(k+2, k+1), (k+3, k)\},$$
$$\Psi_\omega(2k+3) = \{(k+1, k+2), (k, k+3)\}.$$

The second identity is not possible, though, because $(k, k+2) \in \Psi_\omega(2k+2)$, and extending a
factor with this Parikh vector by one letter yields $(k+1, k+2)$ or $(k, k+3)$. Similarly we
see that the third identity cannot hold, and therefore the first one does. Now

$$\Psi_\omega(2k+4) \subseteq \Psi_\omega(2k+3) + \{(1,0), (0,1)\} = \{(k+3, k+1), (k+2, k+2), (k+1, k+3)\}.$$

Since $\rho^{ab}_\omega(2k+4) = 3$, the previous inclusion is an equality, and the inductive part is proved.

It follows from $\Psi_\omega(3) = \{(1, 2), (2, 1)\}$ that 000 and 111 do not occur in $\omega$. Assume that
$\omega$ has one factor of the form $11(01)^k011$. This word has Parikh vector $(k+1, k+4)$, which is
not possible from what precedes. Hence $\omega$ has no factor of the form $11(01)^k011$ and similarly
no factor of the form $00(10)^k100$ (that is, between two consecutive occurrences of 11 there is
an occurrence of 00 and between two occurrences of 00 there is an occurrence of 11). Thus,
since $\omega$ is aperiodic, both 00 and 11 must occur infinitely often, so that $\omega$ can be decomposed as
$pu_1u_2\cdots u_k\cdots$ with each $u_i$ in $1(01)^*00(10)^*1$ and $p$ a suffix of a word in $1(01)^*00(10)^*1$.
This shows that $\omega = \mu(\omega')$, $\omega = 0\mu(\omega')$ or $\omega = 1\mu(\omega')$ for a word $\omega'$.

Assume from now on that $\omega = \mu(\omega')$, $\omega = 0\mu(\omega')$ or $\omega = 1\mu(\omega')$ for a word $\omega'$.
Let $n = 2k + 1$. Then every factor $U$ of $\omega$ of length $n$ is either of the form $\mu(u)a$ or $a\mu(u)$
for some factor $u$ of $\omega'$ of length $k$ and some $a \in \{0, 1\}$. It follows that

$$\Psi_\omega(n) \subseteq \{(k+1, k), (k, k+1)\}.$$

Since $\omega$ is aperiodic, by Lemma 2.3 we have that $\rho^{ab}_\omega(n) \ge 2$ and hence $\rho^{ab}_\omega(n) = 2$.

Next suppose $n = 2k$. Then every factor $U$ of length $n$ is either of the form $\mu(u)$ or $a\mu(v)b$
for some factors $u$ and $v$ of $\omega'$ of length $k$ and $k - 1$ respectively, and for some $a, b \in \{0, 1\}$.
It follows that

$$\Psi_\omega(n) \subseteq \{(k, k), (k-1, k+1), (k+1, k-1)\}.$$

It follows from Lemma 3.2 that $\omega'$ contains factors $u'$ and $v'$ of length $k - 1$ with $u'$ preceded
by 0 and followed by 1 and $v'$ preceded by 1 and followed by 0. Thus $\omega$ contains the factors of
length $n$ of the form $1\mu(u')1$ and $0\mu(v')0$, whose corresponding Parikh vectors are $(k-1, k+1)$
and $(k+1, k-1)$ respectively. Hence

$$\Psi_\omega(n) = \{(k, k), (k-1, k+1), (k+1, k-1)\}$$

whence $\rho^{ab}_\omega(n) = 3$.

□ 

Remark 3.4. The proof of the "only if" part of Theorem 3.3 easily extends to ultimately
periodic words as follows. If an ultimately periodic word $\omega$ has the same Abelian complexity
as the Thue-Morse word, then $\omega$ can be decomposed into $pu_1\cdots u_k(01)^\infty$ or $pu_1\cdots u_k(10)^\infty$
with words $p$ and $u_i$ as in the proof of Theorem 3.3. (By the notation $u^\infty$ we mean the infinite
word $uuu\cdots$.) That is, $\omega$ is of the form $\mu(vb^\infty)$ or $a\mu(vb^\infty)$ with $a, b \in \{0, 1\}$ and for some
finite word $v$.

The converse part of Theorem 3.3, however, does not extend to ultimately periodic words.
Indeed, the word $(01)^\infty = \mu(0^\infty)$ is of the correct form, but it does not have the same Abelian
complexity as the Thue-Morse word.

We end this section with: 

Corollary 3.5. Let $TM_0 \in \{0, 1\}^{\mathbb N}$ denote the Thue-Morse word beginning in 0, and let $\omega$ be
a recurrent infinite word. Then

$$\rho_\omega(n) = \rho_{TM_0}(n) \quad (n \in \mathbb N)$$

and

$$\rho^{ab}_\omega(n) = \rho^{ab}_{TM_0}(n) \quad (n \in \mathbb N)$$

if and only if $\omega$ is in the subshift generated by $TM_0$.

Proof. The "if" part is clear. For the converse, we recall that in [1] it is shown that every
recurrent infinite word $\omega \in \{0, 1\}^{\mathbb N}$ whose subword complexity is equal to that of $TM_0$ is
either in the shift orbit closure of $TM_0$ or in the shift orbit closure of $\Delta(TM_0)$, where $\Delta$
is the letter doubling morphism defined by $0 \mapsto 00$ and $1 \mapsto 11$. However, if $\omega \in \{0, 1\}^{\mathbb N}$ is
in the shift orbit closure of $\Delta(TM_0)$, then it would contain the 4 factors 111, 110, 100, 000 of
length 3, and hence $\rho^{ab}_\omega(3) = 4$, whereas $\rho^{ab}_{TM_0}(3) = 2$. Thus $\omega$ is in the shift orbit closure of $TM_0$.

□ 



4 Two Answers to a Question of G. Rauzy 

Recall that a Sturmian word is an aperiodic binary balanced word. Among the few existing 
families of words characterized using Abelian complexity, the following one concerning
Sturmian words is the earliest one we know$^3$. It is essentially due to E.M. Coven and G.A. Hedlund
(an easy consequence of Lemma 4.02 and Remark 4.07 in [11]).

Theorem 4.1 (E.M. Coven, G.A. Hedlund). Let $\omega$ be a binary right infinite word. Then $\omega$ is
aperiodic and balanced (i.e., a Sturmian word) if and only if $\rho^{ab}_\omega(n) = 2$ for all $n \ge 1$.

Inspired by this characterization of Sturmian words, Rauzy asked whether there exist
aperiodic words on a 3-letter alphabet such that $\rho^{ab}_\omega(n) = 3$ for all $n \ge 1$ (see Section 6.2 in [24]).
Our next two results provide two different solutions to Rauzy's question.

Let $p \ge 3$ be any integer, let $\omega'$ be any Sturmian word over $\{0, 1\}$ and let
$\omega = (p-1)(p-2)\cdots 2\,\omega'$ ($\omega$ is written over the alphabet $\{0, 1, \ldots, p-1\}$). As a consequence of
Theorem 4.1 we can see that $\rho^{ab}_\omega(n) = p$ for all $n \ge 1$ (in particular when $p = 3$). This
provides one answer to Rauzy's question. Nevertheless it is not completely satisfactory since
$\omega$ is not recurrent. The next two results give two distinct classes of uniformly recurrent words
(each factor occurs infinitely often with bounded gaps) having constant ab-complexity equal
to 3.

$^3$ I. Kaboré and T. Tapsoba also characterized, using Abelian complexity, the family of so-called quasi-Sturmian by
insertion words (a subclass of the class of infinite words over a three-letter alphabet having subword complexity $n + 2$)
[16]. These words, defined over a ternary alphabet, verify $\rho^{ab}_\omega(n) = 2$ for $n \neq 0$ even and $\rho^{ab}_\omega(n) = 4$ for $n \neq 1$ odd.

Theorem 4.2. Let $\omega$ be an aperiodic uniformly recurrent balanced word on the alphabet
$\{0, 1, 2\}$. Then $\rho^{ab}_\omega(n) = 3$ for all $n \ge 1$.

Proof. Let $\omega$ be an aperiodic uniformly recurrent balanced word on the alphabet $\{0, 1, 2\}$.
P. Hubert showed that up to word isomorphism, $\omega$ is obtained from a Sturmian word
$\mathbf{x} \in \{x, y\}^{\mathbb N}$ by replacing the subsequence of $x$'s in $\mathbf{x}$ by the periodic word $(01)^\infty$ and the
subsequence of $y$'s in $\mathbf{x}$ by the periodic word $2^\infty$ (see Theorem 1 in [14]). In particular we have
that

1. Deleting all occurrences of 2 in $\omega$ gives the periodic sequence $(01)^\infty$.

2. $h(\omega) = \mathbf{x}$ where $h : \{0, 1, 2\} \to \{x, y\}$ is the morphism defined by $h(0) = h(1) = x$
and $h(2) = y$.

3. If $a \in \{0, 1\}$ and $ua$ and $va$ are distinct factors of $\omega$, then $h(u) \neq h(v)$.

Item 3 follows from item 1 together with the fact that if $h(u) = h(v)$, then either $u = v$, or
$|u| = |v|$ and every occurrence of 0 (respectively 1) in $u$ is an occurrence of 1 (respectively 0)
in $v$ (that is, $u = (01)^k0$, $v = (10)^k1$ for some $k \ge 0$).

Since $\omega$ is a balanced word, we observe that if it has Parikh vectors $(i, j, k)$ and
$(i+1, j-1, k)$ for some integers $i, j, k$, then the only other possible vectors in $\Psi_\omega(n)$ are
$(i+1, j, k-1)$ and $(i, j-1, k+1)$, and these two vectors cannot occur simultaneously
in $\Psi_\omega(n)$. Consequently $\rho^{ab}_\omega(n) \le 3$ for all $n \ge 1$. In what follows, we will show that
$\rho^{ab}_\omega(n) \ge 3$ for all $n \ge 1$.

Applying Theorem 2 in [14], we deduce that $\rho_\omega(n) = 2(n + 1)$ for all $n$ sufficiently
large. This in turn implies that for every $n \ge 1$, either Case 1: $\omega$ contains a (right special)
factor $u$ of length $n$ such that $u0$, $u1$, $u2$ are each factors of $\omega$, or Case 2: $\omega$ contains
two (right special) factors $u$ and $v$ of length $n$ such that $ua$, $ub$, $vc$, $vd$ are each factors of $\omega$
for some $a, b, c, d \in \{0, 1, 2\}$ with $a \neq b$ and $c \neq d$. In Case 1, by considering the Parikh
vector associated to the suffix of length $n$ of each of $u0$, $u1$, $u2$, we obtain three distinct Parikh
vectors, and hence $\rho^{ab}_\omega(n) \ge 3$.

We next consider Case 2. In this case we will show that each of 0, 1, 2 is in $\{a, b, c, d\}$.
Assume to the contrary that $2 \notin \{a, b, c, d\}$. Then $\{a, b, c, d\} = \{0, 1\}$. But then by item 1
above, neither 0 nor 1 occurs in $u$ and in $v$, and thus $u = 2^n$ and $v = 2^n$, contradicting that
$u \neq v$. Thus $2 \in \{a, b, c, d\}$. Next suppose that $0 \notin \{a, b, c, d\}$. Then $u1$, $u2$, $v1$, $v2$ are all
factors of $\omega$. It follows by item 3 above that $h(u) \neq h(v)$. But as $h(1) \neq h(2)$, it follows that
$h(u)$ and $h(v)$ are distinct right special factors of $\mathbf{x}$ of length $n$, contradicting that a Sturmian
word has exactly one right special factor of every length. Hence, we have $0 \in \{a, b, c, d\}$.
Similarly we deduce that $1 \in \{a, b, c, d\}$.

Having established that $\{a, b, c, d\} = \{0, 1, 2\}$, by considering the Parikh vector of $u$, $v$,
and of the suffix of length $n$ of each of $ua$, $ub$, $vc$, $vd$, we obtain at least three distinct Parikh
vectors, and hence $\rho^{ab}_\omega(n) \ge 3$.
□



Theorem 4.3. Let $\omega' \in \{0, 1\}^{\mathbb N}$ be any aperiodic infinite word, and let $\omega$ be the image of $\omega'$
under the morphism $f$ defined by $0 \mapsto 012$ and $1 \mapsto 021$. Then $\rho^{ab}_\omega(n) = 3$ for all $n \ge 1$.

Proof. We first note that: 

1. For every factor $U$ of $\omega$, there exist a suffix $x$ (possibly empty) of 012 or of 021, a prefix
$y$ (possibly empty) of either 012 or of 021, and a factor $u$ (possibly empty) of $\omega'$, such
that $U = xf(u)y$.

2. By Lemma 3.2, for every suffix $x$ of 012 (respectively 021) and for every prefix $y$ of 021
(respectively 012), and for every $k \ge 0$, there exists a factor $u$ of $\omega'$ of length $k$ such that
$xf(u)y$ is a factor of $\omega$ of length $3k + |x| + |y|$.

In order to prove that $\rho^{ab}_\omega(n) = 3$ we consider separately the cases $n \equiv 0 \pmod 3$,
$n \equiv 1 \pmod 3$, and $n \equiv 2 \pmod 3$. In each case, we simply count the number of distinct
Parikh vectors associated to words of the form $xf(u)y$ with $x$ and $y$ as in item 1 above and
with $|x| + |y| \equiv n \pmod 3$. In each case this gives that $\rho^{ab}_\omega(n) \le 3$. Similarly, by counting
the number of distinct Parikh vectors associated to words of the form $xf(u)y$ with $x$ and $y$ as
in item 2 above and with $|x| + |y| \equiv n \pmod 3$, we find that $\rho^{ab}_\omega(n) \ge 3$.

□ 
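The statement of Theorem 4.3 can be tested numerically on a finite prefix. In the sketch below, which is ours, the aperiodic binary word $\omega'$ is taken to be the Fibonacci word; this choice, the prefix length and the helper names are assumptions made only for the illustration.

```python
def fibonacci_prefix(length):
    """Prefix of the Fibonacci word, an aperiodic binary word."""
    a, b = "0", "01"
    while len(b) < length:
        a, b = b, b + a
    return b[:length]

def parikh(u, alphabet="012"):
    """Parikh vector of u over the ordered ternary alphabet."""
    return tuple(u.count(a) for a in alphabet)

def abelian_complexity(w, n):
    """Number of distinct Parikh vectors among length-n factors of w."""
    return len({parikh(w[i:i + n]) for i in range(len(w) - n + 1)})

f = {"0": "012", "1": "021"}          # the morphism of Theorem 4.3
omega = "".join(f[c] for c in fibonacci_prefix(1500))

values = [abelian_complexity(omega, n) for n in range(1, 31)]
print(values)
```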

Having answered Rauzy's original question, we may now ask: 

Open problem 1. Does there exist a recurrent infinite word $\omega$ with $\rho^{ab}_\omega(n) = 4$ for every
$n \ge 1$?

We suspect that the answer is NO! Using Hubert's main characterization of uniformly
recurrent aperiodic balanced words given in [14], it can be shown that if $\rho^{ab}_\omega(n) = 4$ for every
$n \ge 1$, then $\omega$ is not balanced. While we suspect that the answer to the above question is no,
Aleksi Saarela [27] recently showed that for every positive integer $k$, there exists an infinite
word whose Abelian complexity at $n$ equals $k$ for all sufficiently large $n$.

5 Bounded Abelian Complexity 

By a well-known result of V. Keränen [17], Abelian squares are avoidable over quaternary
alphabets. This is to be contrasted with the following result, which says that avoiding Abelian
powers while keeping the Abelian complexity bounded is impossible.

Theorem 5.1. Let $\omega$ be an infinite word on a finite alphabet having bounded ab-complexity.
Then $\omega$ contains an Abelian $k$-power for every positive integer $k$.

Remark 5.2. Unlike with Abelian complexity, low subword complexity does not guarantee
existence of repetitions. For example, the Fibonacci word does not contain any 4th powers [22],
yet its subword complexity is minimal amongst all aperiodic infinite words.
We begin proving Theorem 5.1 with the following lemma:

Lemma 5.3. Let $M$ and $r$ be positive integers. Then there exist positive integers $\alpha_1, \alpha_2, \ldots, \alpha_r$
and $N$ such that whenever

$$\sum_{i=1}^{r} c_i\alpha_i \equiv 0 \pmod N$$

for integers $c_i$ with $|c_i| \le M$ for $1 \le i \le r$, then $c_1 = c_2 = \cdots = c_r = 0$.

Proof. We define $\alpha_i$ inductively as follows: Set $\alpha_1 = 1$, and for $i \ge 1$, choose

$$\alpha_{i+1} > M\sum_{j=1}^{i}\alpha_j.$$

Let $N$ be any integer with $N > M\sum_{j=1}^{r}\alpha_j$. Now suppose that

$$\sum_{i=1}^{r} c_i\alpha_i \equiv 0 \pmod N$$

with each $|c_i| \le M$. Then

$$\Bigl|\sum_{i=1}^{r} c_i\alpha_i\Bigr| \le \sum_{i=1}^{r} |c_i|\alpha_i \le M\sum_{i=1}^{r}\alpha_i < N.$$

Hence

$$\sum_{i=1}^{r} c_i\alpha_i = 0.$$

To see that each $c_i = 0$, suppose to the contrary that some $c_i \neq 0$, and let $t$ denote the largest
positive integer $1 \le t \le r$ for which $c_t \neq 0$. If $t = 1$, then we have $c_1 = c_1\alpha_1 = 0$, a
contradiction. Otherwise, if $t > 1$,

$$|c_t\alpha_t| = \Bigl|\sum_{i=1}^{t-1} c_i\alpha_i\Bigr| \le \sum_{i=1}^{t-1} |c_i|\alpha_i \le M\sum_{i=1}^{t-1}\alpha_i < \alpha_t \le |c_t\alpha_t|,$$

a contradiction.

□ 
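The inductive construction in the proof is easy to carry out explicitly. The sketch below is an illustration with our own function names: it takes the smallest admissible choice at each step (an assumption, since the proof allows any larger values) and then checks by exhaustive search, for small $M$ and $r$, that the only coefficient vector with $|c_i| \le M$ giving a sum divisible by $N$ is the zero vector.

```python
from itertools import product

def separating_weights(M, r):
    """Weights alpha_1..alpha_r and modulus N following the proof:
    alpha_1 = 1, alpha_{i+1} > M * (alpha_1 + ... + alpha_i),
    and N > M * (alpha_1 + ... + alpha_r)."""
    alphas = [1]
    for _ in range(r - 1):
        alphas.append(M * sum(alphas) + 1)   # smallest admissible value
    N = M * sum(alphas) + 1
    return alphas, N

def only_trivial_solution(M, alphas, N):
    """Exhaustively check that sum(c_i * alpha_i) = 0 mod N with |c_i| <= M
    forces every c_i to be zero."""
    for cs in product(range(-M, M + 1), repeat=len(alphas)):
        if any(cs) and sum(c * a for c, a in zip(cs, alphas)) % N == 0:
            return False
    return True

alphas, N = separating_weights(2, 3)
print(alphas, N)
print(only_trivial_solution(2, alphas, N))
```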

We now recall the following well-known result due to van der Waerden:

Theorem 5.4 (van der Waerden's theorem - see, e.g., chapter 3 in [18]). If $\mathbb N$ is partitioned into
$k$ classes, then one of the classes contains arbitrarily long arithmetic progressions.

Proof of Theorem 5.1. Let $r = \mathrm{Card}(A)$. By hypothesis there exists a positive integer $M$ such
that $\rho^{ab}_\omega(n) \le M$ for every $n \ge 1$. Thus for every $(a_1, a_2, \ldots, a_r), (b_1, b_2, \ldots, b_r) \in \Psi_\omega(n)$,
we have $|a_i - b_i| \le M$ for $1 \le i \le r$. Let $\alpha_1, \alpha_2, \ldots, \alpha_r$ and $N$ be as in Lemma 5.3. Up
to word isomorphism, we may regard $\omega = \omega_1\omega_2\omega_3\ldots$ as an infinite word on the alphabet
$\{\alpha_1, \alpha_2, \ldots, \alpha_r\}$. For each $1 \le i \le j$ we set $\omega[i,j] = \omega_i\cdots\omega_j$, and

$$\Sigma\omega[i,j] = \omega_i + \cdots + \omega_j.$$

Now consider the following function:

$$\nu : \{1, 2, 3, \ldots\} \to \{0, 1, \ldots, N-1\}$$

defined by

$$\nu(t) = \Sigma\omega[1,t] \pmod N.$$

By van der Waerden's Theorem, for every positive integer $k$, there exist positive integers $t_0$
and $s$ such that

$$\nu(t_0) = \nu(t_0 + s) = \nu(t_0 + 2s) = \cdots = \nu(t_0 + ks). \qquad (1)$$

For each $1 \le j \le k$, set

$$\omega^{[j]} = \omega[t_0 + (j-1)s + 1,\; t_0 + js].$$

Then by (1)

$$\Sigma\omega^{[j]} \equiv 0 \pmod N \qquad (2)$$

for each $1 \le j \le k$.

We will show that $\Psi(\omega^{[j]}) = \Psi(\omega^{[1]})$ for every $1 \le j \le k$. Set $\Psi(\omega^{[j]}) = (a_1^{[j]}, a_2^{[j]}, \ldots, a_r^{[j]})$.
By (2)

$$\sum_{i=1}^{r} a_i^{[j]}\alpha_i \equiv \sum_{i=1}^{r} a_i^{[1]}\alpha_i \pmod N,$$

and hence

$$\sum_{i=1}^{r} \bigl(a_i^{[j]} - a_i^{[1]}\bigr)\alpha_i \equiv 0 \pmod N.$$

Moreover as $|\omega^{[j]}| = |\omega^{[1]}|$ for each $1 \le j \le k$, we have that $|a_i^{[j]} - a_i^{[1]}| \le M$. By Lemma 5.3
we deduce that $a_i^{[j]} - a_i^{[1]} = 0$ and hence that $\Psi(\omega^{[j]}) = \Psi(\omega^{[1]})$ for every $1 \le j \le k$. Thus
the factor $\omega^{[1]}\omega^{[2]}\cdots\omega^{[k]}$ is an Abelian $k$-power of $\omega$.

□ 
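The proof is not constructive in practice, since van der Waerden's theorem gives no usable bound, but Abelian powers in a given finite word can be located by direct search. The following sketch is ours and does not follow the proof: it scans Abelian periods in increasing order and compares consecutive blocks up to permutation of letters. The function name and the test word (a prefix of the Thue-Morse word, whose Abelian complexity is bounded by 3) are assumptions for the illustration.

```python
def find_abelian_power(w, k, max_period=None):
    """Return (start, period) of the first Abelian k-power found in the
    finite word w, scanning periods in increasing order, or None."""
    limit = max_period if max_period is not None else len(w) // k
    for period in range(1, limit + 1):
        for start in range(len(w) - k * period + 1):
            blocks = [w[start + j * period: start + (j + 1) * period]
                      for j in range(k)]
            first = sorted(blocks[0])
            # Blocks are Abelian equivalent iff they agree after sorting.
            if all(sorted(b) == first for b in blocks[1:]):
                return start, period
    return None

tm_prefix = "0110100110010110"   # first 16 letters of the Thue-Morse word
print(find_abelian_power(tm_prefix, 2))
print(find_abelian_power(tm_prefix, 3))
print(find_abelian_power(tm_prefix, 8))
```

For the Thue-Morse word the search succeeds with period 2 for every $k \ge 3$, since the word is a concatenation of the blocks 01 and 10.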

Corollary 5.5. Let $k$ be a positive integer and $\omega$ an infinite word which avoids Abelian
$k$-powers. Then $\omega$ is arbitrarily imbalanced, that is, $\omega$ is not $C$-balanced for any positive integer
$C$.

Proof. This follows immediately from Theorem 5.1 together with Lemma 2.2. □

Theorem 5.1 naturally gives rise to the following question.

Open problem 2. Does every uniformly recurrent infinite word with bounded Abelian
complexity begin with an Abelian $k$-power for each positive integer $k$?
This problem seems difficult to solve even for some very well-known words. For example,
it can be shown that every shift of the Tribonacci word begins in an Abelian $k$-power for all
$k$, but we do not know if this is true for every word in its orbit closure. It can also be shown
that every shift of the Thue-Morse word begins in an Abelian 6-power, but again, we do not
know whether this holds for larger Abelian powers or for every word in the orbit closure of the
Thue-Morse word. In the next section, however, we show that the question has a positive answer in
the case of Sturmian words, even in a very strong form.

6 Abelian repetitions in Sturmian words 

In this section, we prove the following theorem, thereby answering Open Problem 2 in the
affirmative in the case of Sturmian words.

Theorem 6.1. For every Sturmian word $\omega$ and every positive integer $k$, there exist two integers
$\ell_1$ and $\ell_2$ such that each position in $\omega$ begins in an Abelian $k$-power $U_1U_2\cdots U_k$ with Abelian
period $\ell_1$ or $\ell_2$, that is, $|U_i| \in \{\ell_1, \ell_2\}$. In particular, every Sturmian word begins in an Abelian
$k$-power for all positive integers $k$.



Remark 6.2. Theorem 6.1 should be contrasted with the fact that there exist Sturmian words
whose initial critical exponent is equal to 2 (see [9]). We also note that the existence of two
Abelian periods in Theorem 6.1 is optimal in the sense that any word with the property that
every position starts with an Abelian $k$-power with a fixed Abelian period $m$ is necessarily
ultimately periodic.

Let us recall (H that an infinite word is Sturmian if and only if its set of factors coincides 
with that of a characteristic word. A characteristic word is an infinite word $c_\alpha$, depending on
an irrational number $\alpha$ with $0 < \alpha < 1$, such that

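\[ c_\alpha(n) = \lfloor \alpha(n+1) \rfloor - \lfloor \alpha n \rfloor \]

for all $n \ge 1$. Equivalently, the characteristic word $c_\alpha$ can also be defined by

\[ c_\alpha(n) = R_n(\alpha) \qquad (n \ge 1), \]

where

\[ R_n(\alpha) = \begin{cases} 0 & \text{if } \{n\alpha\} < 1 - \alpha; \\ 1 & \text{otherwise.} \end{cases} \]

For positive integers $i \le j$ we put

\[ c_\alpha[i,j] = c_\alpha(i)\, c_\alpha(i+1) \cdots c_\alpha(j). \]

In the proof of the next lemma, we use the well-known fact that, for all real numbers $x, y, a$,
we have

\[ |\{x+a\} - \{y+a\}| = \begin{cases} |\{x\} - \{y\}| & \text{or} \\ 1 - |\{x\} - \{y\}|. \end{cases} \tag{3} \]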
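The two descriptions of the characteristic word above can be compared numerically. The following is a minimal sketch, assuming the illustrative slope $\alpha = (3-\sqrt{5})/2$ (not a value fixed by the text) and ordinary floating-point arithmetic, which is adequate only for moderate $n$; the helper names `c_floor`, `c_rotation` and `char_word` are hypothetical.

```python
import math

# Assumed example slope (not from the text): an irrational in (0, 1/2).
ALPHA = (3 - math.sqrt(5)) / 2

def frac(x):
    """Fractional part {x} of a real number x."""
    return x - math.floor(x)

def c_floor(alpha, n):
    """c_alpha(n) via the floor-difference formula."""
    return math.floor(alpha * (n + 1)) - math.floor(alpha * n)

def c_rotation(alpha, n):
    """c_alpha(n) via R_n(alpha): 0 if {n*alpha} < 1 - alpha, and 1 otherwise."""
    return 0 if frac(n * alpha) < 1 - alpha else 1

def char_word(alpha, i, j):
    """The factor c_alpha[i, j] as a list of letters (1-indexed, inclusive)."""
    return [c_floor(alpha, n) for n in range(i, j + 1)]

# Compare both definitions on the first 2000 positions.
agree = all(c_floor(ALPHA, n) == c_rotation(ALPHA, n) for n in range(1, 2001))
prefix = "".join(map(str, char_word(ALPHA, 1, 13)))
print(agree, prefix)
```

For this particular slope the word obtained is the Fibonacci word, which makes the output easy to recognise by eye.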

Lemma 6.3. Suppose that $\alpha$ is irrational with $0 < \alpha < 1/2$. Suppose also that $i, j \ge 1$ are
integers such that $|\{i\alpha\} - \{j\alpha\}| < \alpha$. If $k \ge 1$ is an integer such that

\[ R_{i+k}(\alpha) = R_{j+k}(\alpha), \]

then the words

\[ c_\alpha[i, i+k] \quad\text{and}\quad c_\alpha[j, j+k] \tag{4} \]

are Abelian equivalent.

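Proof. In Eq. (4), denote the word on the left by $W_i$ and the one on the right by $W_j$. Since
$R_{i+k}(\alpha) = R_{j+k}(\alpha)$, we have

\[ |\{(i+k)\alpha\} - \{(j+k)\alpha\}| < \max\{\alpha, 1-\alpha\} = 1 - \alpha. \]

Hence from $|\{i\alpha\} - \{j\alpha\}| < \alpha$ and Eq. (3), we deduce that

\[ |\{i\alpha\} - \{j\alpha\}| = |\{(i+k)\alpha\} - \{(j+k)\alpha\}|. \]

The hypothesis $R_{i+k}(\alpha) = R_{j+k}(\alpha)$ also implies that

\[ |\{(i+k)\alpha\} - \{(j+k)\alpha\}| = |\{(i+k+1)\alpha\} - \{(j+k+1)\alpha\}| \]

and so

\[ |\{(i+k+1)\alpha\} - \{(j+k+1)\alpha\}| < \alpha. \]

The number of letters 1 in $W_i$ is given by

\[ |W_i|_1 = \sum_{h=0}^{k} c_\alpha(i+h) = \sum_{h=0}^{k} \bigl( \lfloor (i+h+1)\alpha \rfloor - \lfloor (i+h)\alpha \rfloor \bigr) = \lfloor (i+k+1)\alpha \rfloor - \lfloor i\alpha \rfloor, \]

and similarly for $W_j$. These give

\begin{align*}
\bigl|\, |W_j|_1 - |W_i|_1 \bigr|
 &= \bigl| \bigl( \lfloor (j+k+1)\alpha \rfloor - \lfloor j\alpha \rfloor \bigr) - \bigl( \lfloor (i+k+1)\alpha \rfloor - \lfloor i\alpha \rfloor \bigr) \bigr| \\
 &= \bigl| \lfloor i\alpha \rfloor - \lfloor j\alpha \rfloor + \lfloor (j+k+1)\alpha \rfloor - \lfloor (i+k+1)\alpha \rfloor \bigr| \\
 &= \bigl| i\alpha - \{i\alpha\} - j\alpha + \{j\alpha\} + (j+k+1)\alpha - \{(j+k+1)\alpha\} - (i+k+1)\alpha + \{(i+k+1)\alpha\} \bigr| \\
 &\le |\{j\alpha\} - \{i\alpha\}| + |\{(i+k+1)\alpha\} - \{(j+k+1)\alpha\}| \\
 &< 2\alpha < 1.
\end{align*}

Consequently, $W_i$ and $W_j$ are Abelian equivalent. □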
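Lemma 6.3 can be probed by brute force on a small range of indices. The sketch below assumes the illustrative slope $\alpha = (3-\sqrt{5})/2$ and floating-point arithmetic; the function `lemma_counterexamples` is a hypothetical helper. It is a numerical sanity check under these assumptions, not a substitute for the proof.

```python
import math

# Assumed example slope (not from the text): irrational, 0 < ALPHA < 1/2.
ALPHA = (3 - math.sqrt(5)) / 2

def frac(x):
    """Fractional part {x}."""
    return x - math.floor(x)

def R(alpha, n):
    """R_n(alpha): 0 if {n*alpha} < 1 - alpha, and 1 otherwise."""
    return 0 if frac(n * alpha) < 1 - alpha else 1

def ones(alpha, i, k):
    """Number of letters 1 in c_alpha[i, i + k], by the telescoping sum in the proof."""
    return math.floor((i + k + 1) * alpha) - math.floor(i * alpha)

def lemma_counterexamples(alpha, bound):
    """Triples (i, j, k) below `bound` meeting the hypotheses but not the conclusion."""
    bad = []
    for i in range(1, bound):
        for j in range(1, bound):
            if abs(frac(i * alpha) - frac(j * alpha)) >= alpha:
                continue  # hypothesis |{i*alpha} - {j*alpha}| < alpha fails
            for k in range(1, bound):
                if R(alpha, i + k) == R(alpha, j + k) and ones(alpha, i, k) != ones(alpha, j, k):
                    bad.append((i, j, k))
    return bad

counterexamples = lemma_counterexamples(ALPHA, 40)
print(counterexamples)
```

Since both words have the same length $k+1$ over a binary alphabet, comparing the number of letters 1 is enough to decide Abelian equivalence.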

In the proof of the next theorem, we will use the following fact: for all real numbers $x, y$
and integers $p \ge 1$, we have

\[ \{x + py\} = \{x\} + p\{y\} - (p - q), \tag{5} \]

where $q$ is the integer for which

\[ p - q \le \{x\} + p\{y\} < p - q + 1. \]
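Identity (5) holds for all real numbers; as a small illustration it can be tested exactly on rational inputs, where no rounding occurs. This is a sketch with hypothetical helper names, using Python's `fractions.Fraction`.

```python
from fractions import Fraction
from math import floor

def frac(x):
    """Fractional part {x}; exact when x is a Fraction."""
    return x - floor(x)

def identity_holds(x, y, p):
    """Check Eq. (5) for a single triple (x, y, p)."""
    s = frac(x) + p * frac(y)
    q = p - floor(s)  # the integer q with p - q <= s < p - q + 1
    in_window = (p - q) <= s < (p - q + 1)
    return in_window and frac(x + p * y) == s - (p - q)

# Rational sample points, including negative ones.
samples = [
    (Fraction(a, 7), Fraction(b, 11), p)
    for a in range(-10, 11)
    for b in range(-10, 11)
    for p in range(1, 6)
]
all_ok = all(identity_holds(x, y, p) for x, y, p in samples)
print(all_ok)
```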

Theorem 6.4. Let $\alpha$ be irrational with $0 < \alpha < 1$. For all positive integers $k$ and $i$, there is
an Abelian $k$-power occurring at position $i$ in $c_\alpha$.

Proof. It is a well-known fact that $c_{1-\alpha}$ is obtained from $c_\alpha$ by exchanging the letters 0 and 1 (see
Cor. 2.2.20 of the reference cited above for characteristic words). Therefore, without loss of generality, we may suppose that $\alpha < 1/2$. To
show that there exists an Abelian $k$-power at the $i$th position in $c_\alpha$, let us choose a real number
$\delta$ with

\[ 0 < \delta < \alpha. \tag{6} \]

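We have two cases to consider:

Case 1. We have

\[ 0 \le \{i\alpha\} < \alpha - \delta, \quad\text{or}\quad \alpha \le \{i\alpha\} < 1 - \delta. \tag{7} \]

None of these intervals is empty by Eq. (6) and since $\alpha < 1/2$.
Let $\ell$ be a positive integer such that

\[ \{\ell\alpha\} < \delta/k. \tag{8} \]

This choice of $\ell$ and the inequality $\{i\alpha\} < 1 - \delta$ imply that, for all $0 \le j \le k$, we have

\[ \{i\alpha\} + j\{\ell\alpha\} < 1 - \frac{(k-j)\delta}{k} \le 1, \]

so from Eq. (5) we get

\[ \{(i + j\ell)\alpha\} = \{i\alpha\} + j\{\ell\alpha\}. \]

This and Eq. (8) give

\[ \{i\alpha\} \le \{(i + j\ell)\alpha\} < \{i\alpha\} + \delta \tag{9} \]

for all $0 \le j \le k$. These imply that if $\{i\alpha\} < \alpha - \delta$, then we have

\[ R_{i+\ell-1}(\alpha) = R_{i+2\ell-1}(\alpha) = \cdots = R_{i+k\ell-1}(\alpha) = 1. \]

Similarly, if $\{i\alpha\} \ge \alpha$, inequalities (9) imply

\[ R_{i+\ell-1}(\alpha) = R_{i+2\ell-1}(\alpha) = \cdots = R_{i+k\ell-1}(\alpha) = 0. \]

By these observations, Lemma 6.3 says that the $k$ words

\[ c_\alpha[\,i + j\ell,\; i + (j+1)\ell - 1\,] \qquad (0 \le j \le k-1) \]

of length $\ell$ are Abelian equivalent. Therefore $c_\alpha[\,i,\, i + k\ell - 1\,]$ is an Abelian $k$-power.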
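Case 2. We have

\[ \alpha - \delta \le \{i\alpha\} < \alpha, \quad\text{or}\quad 1 - \delta \le \{i\alpha\} < 1. \tag{10} \]

Now we choose a positive integer $\ell$ such that

\[ 1 - \frac{\alpha - \delta}{k} < \{\ell\alpha\} < 1. \tag{11} \]

Since $\{i\alpha\} \ge \alpha - \delta$, we have, for all $0 \le j \le k$,

\[ 1 + j > \{i\alpha\} + j\{\ell\alpha\} \ge j + \frac{(k-j)(\alpha - \delta)}{k} \ge j. \]

Hence by Eq. (5),

\[ \{(i + j\ell)\alpha\} = \{i\alpha\} - j(1 - \{\ell\alpha\}), \]

and so the lower bound for $\{\ell\alpha\}$ in Eq. (11) gives

\[ \{i\alpha\} - (\alpha - \delta) < \{(i + j\ell)\alpha\} \le \{i\alpha\} \tag{12} \]

for all $0 \le j \le k$.

Now if $\{i\alpha\} \ge 1 - \delta$, we have

\[ \{(i + j\ell)\alpha\} > 1 - \alpha > \alpha \]

for all $0 \le j \le k$, and this gives

\[ R_{i+\ell-1}(\alpha) = R_{i+2\ell-1}(\alpha) = \cdots = R_{i+k\ell-1}(\alpha) = 0. \]

Otherwise $\{i\alpha\} < \alpha$, and we have

\[ R_{i+\ell-1}(\alpha) = R_{i+2\ell-1}(\alpha) = \cdots = R_{i+k\ell-1}(\alpha) = 1. \]

By these equations and Eq. (12), Lemma 6.3 implies that the $k$ words

\[ c_\alpha[\,i + j\ell,\; i + (j+1)\ell - 1\,] \qquad (0 \le j \le k-1) \]

of length $\ell$ are Abelian equivalent, and so the word $c_\alpha[\,i,\, i + k\ell - 1\,]$ is an Abelian $k$-power.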

□ 
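The proof of Theorem 6.4 is constructive: once $\delta$ is fixed, a suitable period $\ell$ can be found by searching for the first integer satisfying (8) in Case 1 or (11) in Case 2. The sketch below follows that recipe under several assumptions that are not part of the text: the slope $\alpha = (3-\sqrt{5})/2$ (already less than $1/2$), the choice $\delta = \alpha/2$, floating-point arithmetic, and the hypothetical helper names `find_period` and `is_abelian_power`.

```python
import math

# Assumed example slope (not from the text): irrational, 0 < ALPHA < 1/2.
ALPHA = (3 - math.sqrt(5)) / 2

def frac(x):
    """Fractional part {x}."""
    return x - math.floor(x)

def c(alpha, n):
    """Letter c_alpha(n) of the characteristic word."""
    return math.floor(alpha * (n + 1)) - math.floor(alpha * n)

def find_period(alpha, i, k, delta):
    """Smallest ell satisfying condition (8) in Case 1 or condition (11) in Case 2."""
    x = frac(i * alpha)
    case1 = x < alpha - delta or alpha <= x < 1 - delta
    ell = 1
    while True:
        y = frac(ell * alpha)
        if case1 and y < delta / k:
            return ell
        if not case1 and y > 1 - (alpha - delta) / k:
            return ell
        ell += 1

def is_abelian_power(alpha, i, ell, k):
    """True if the k consecutive blocks of length ell starting at i have equally many 1s."""
    counts = {
        sum(c(alpha, n) for n in range(i + j * ell, i + (j + 1) * ell))
        for j in range(k)
    }
    return len(counts) == 1

delta = ALPHA / 2
checked = all(
    is_abelian_power(ALPHA, i, find_period(ALPHA, i, k, delta), k)
    for i in range(1, 151)
    for k in range(1, 5)
)
print(checked)
```

The search loop terminates because the fractional parts $\{\ell\alpha\}$ are dense in the unit interval for irrational $\alpha$; the period it returns depends on $i$, which is the dependence that the proof of Theorem 6.1 removes.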

Let us denote by $\alpha = [0; a_1, a_2, a_3, \ldots]$ the continued fraction expansion of $\alpha$. For $n \ge 0$,
denote

\[ \frac{p_n}{q_n} = [0; a_1, \ldots, a_n], \]

where $\gcd(q_n, p_n) = 1$. We will need the next two basic properties of continued fractions.

(1) We have $q_{n+1} > q_n$ for all $n \ge 1$.

(2) If $n \ge 0$ is even, we have

\[ 0 < \alpha - \frac{p_n}{q_n} < \frac{1}{q_n q_{n+1}}; \tag{13} \]

and if $n$ is odd, we have

\[ 0 < \frac{p_n}{q_n} - \alpha < \frac{1}{q_n q_{n+1}}. \tag{14} \]
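Properties (1) and (2) can be illustrated on a concrete slope. The following sketch assumes $\alpha = (3-\sqrt{5})/2$, whose expansion is $[0; 2, 1, 1, 1, \ldots]$, and extracts only a dozen partial quotients because the floating-point extraction loses accuracy quickly; `partial_quotients` and `convergents` are hypothetical helper names.

```python
import math

# Assumed example slope (not from the text).
ALPHA = (3 - math.sqrt(5)) / 2

def partial_quotients(alpha, count):
    """a_1, ..., a_count for alpha in (0, 1); float-based, so keep count small."""
    quotients, x = [], alpha
    for _ in range(count):
        x = 1 / x
        a = math.floor(x)
        quotients.append(a)
        x -= a
    return quotients

def convergents(quotients):
    """Pairs (p_n, q_n) for n = 0, ..., len(quotients), using the standard recurrence."""
    p_prev, q_prev = 1, 0  # (p_{-1}, q_{-1})
    p, q = 0, 1            # (p_0, q_0), since a_0 = 0
    result = [(p, q)]
    for a in quotients:
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
        result.append((p, q))
    return result

quotients = partial_quotients(ALPHA, 12)
conv = convergents(quotients)

# Property (1): denominators increase strictly from n = 1 on.
increasing = all(conv[n + 1][1] > conv[n][1] for n in range(1, len(conv) - 1))

# Property (2): inequality (13) for even n and inequality (14) for odd n.
def bound_holds(n):
    p, q = conv[n]
    gap = ALPHA - p / q if n % 2 == 0 else p / q - ALPHA
    return 0 < gap < 1 / (q * conv[n + 1][1])

bounds_ok = all(bound_holds(n) for n in range(len(conv) - 1))
print(quotients, conv[:5], increasing, bounds_ok)
```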

Proof of Theorem 6.1. Let $\alpha$ denote the slope of the Sturmian word $\omega$. The set of factors of $\omega$
coincides with that of the characteristic word $c_\alpha$, so it suffices to prove the claim for $c_\alpha$.

As in the proof of Theorem 6.4, we assume that $\alpha < 1/2$ and let $\delta$ be a real number with
$0 < \delta < \alpha$. As $\lim_{n \to +\infty} q_n = +\infty$, there exists an even integer $n \ge 0$ such that

\[ q_{n+1} > \frac{k}{\min\{\delta, \alpha - \delta\}}. \]

Eq. (13) then implies that

\[ \{q_n\alpha\} = q_n\alpha - p_n < \frac{1}{q_{n+1}} < \frac{\min\{\delta, \alpha - \delta\}}{k} \le \frac{\delta}{k}. \]

Consequently, if $\{i\alpha\}$ is contained in one of the intervals of Case 1 in the proof of Theorem 6.4,
we may choose $\ell = q_n$, and then the word $c_\alpha[\,i,\, i + k\ell - 1\,]$ is an Abelian $k$-power with Abelian
period $q_n$.

If $\{i\alpha\}$ is in one of the intervals of Case 2, we may choose $\ell = q_{n+1}$. Indeed, then by
Eq. (14), we have

\[ 1 - \{q_{n+1}\alpha\} = p_{n+1} - \alpha q_{n+1} < \frac{1}{q_{n+2}} < \frac{1}{q_{n+1}} < \frac{\min\{\delta, \alpha - \delta\}}{k}. \]

That is,

\[ 1 - \frac{\alpha - \delta}{k} < \{q_{n+1}\alpha\} < 1. \]

Now by the proof of Case 2, the word $c_\alpha[\,i,\, i + k\ell - 1\,]$ is an Abelian $k$-power with Abelian
period $q_{n+1}$, and the claim follows. □
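The choice of the two periods $q_n$ and $q_{n+1}$ in the proof above can be traced on an example. The sketch below assumes the slope $\alpha = (3-\sqrt{5})/2$ with partial quotients $a_1 = 2$, $a_2 = a_3 = \cdots = 1$, the choice $\delta = \alpha/2$, and $k = 3$; none of these values come from the text, and `two_periods` and `is_abelian_power` are hypothetical helpers.

```python
import math

# Assumed example slope (not from the text), with 0 < ALPHA < 1/2.
ALPHA = (3 - math.sqrt(5)) / 2
# Partial quotients a_1, a_2, ... of this particular ALPHA.
QUOTIENTS = [2] + [1] * 10

def ones(alpha, start, length):
    """Number of letters 1 in c_alpha[start, start + length - 1]."""
    return math.floor((start + length) * alpha) - math.floor(start * alpha)

def is_abelian_power(alpha, i, ell, k):
    """True if the k consecutive blocks of length ell starting at i are Abelian equivalent."""
    counts = {ones(alpha, i + j * ell, ell) for j in range(k)}
    return len(counts) == 1

def two_periods(alpha, k, delta, quotients):
    """(q_n, q_{n+1}) for the first even n with q_{n+1} > k / min(delta, alpha - delta)."""
    qs, q_prev, q = [1], 0, 1  # qs[0] = q_0 = 1
    for a in quotients:
        q, q_prev = a * q + q_prev, q
        qs.append(q)
    bound = k / min(delta, alpha - delta)
    for n in range(0, len(qs) - 1, 2):
        if qs[n + 1] > bound:
            return qs[n], qs[n + 1]
    raise ValueError("not enough partial quotients supplied")

k = 3
l1, l2 = two_periods(ALPHA, k, ALPHA / 2, QUOTIENTS)
# Every tested position should start an Abelian k-power with period l1 or l2.
covered = all(
    is_abelian_power(ALPHA, i, l1, k) or is_abelian_power(ALPHA, i, l2, k)
    for i in range(1, 1001)
)
print(l1, l2, covered)
```

In contrast with a position-by-position search for a period, the pair returned here depends only on $\alpha$, $\delta$ and $k$, which is the point of Theorem 6.1.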



Remark 6.5. The property of Sturmian words given in Theorem 6.1 does not provide a characterization
of Sturmian words. For instance, this same property is also satisfied by any word
of the form $f(\omega)$ with $\omega$ a Sturmian word and where $f$ is the morphism defined by $f(0) = 00$
and $f(1) = 01$.



Remark 6.6. According to the terminology introduced in [28], Sturmian words are everywhere
Abelian repetitive.